The visual below shows the type of study we originally wanted to conduct. The underlying data is not open source, and because we could not find comparable data we changed both the data set and the remainder of the project.
It does give a rough answer to our original questions: “When is the cheapest time to buy airline tickets? Does pricing change significantly as demand for flights varies? Do airlines vary price to combat strategic consumers?” In short: yes, ticket prices are extremely elastic.
Data source: Cheapair.com
Higher cost airlines are inclined to determine which market consumers belong to, tourism or business.
Last-minute deals are more than offset by the increased prices leading up to them.
Price discrimination, while difficult for airlines to execute, still occurs.
Summarizing the first two points, studies find that to maximize their profits airlines should charge lower fares to tourists buying tickets one to two periods in advance, charge higher fares to business travelers in the period of a flight, and cut costs on the day of a flight.
We find evidence of airline price discrimination in other studies, some of which show that airlines set higher fares for weekday flights than for weekend flights in an attempt to price discriminate against people going on business trips.
All data was obtained via the US Federal Aviation Database systems.
Because of the size of the data, this document is built with code that randomly samples 20,000 points from our 11 million. As a result the charts below vary from one render to the next, so we refrained from interpreting them; the tabular outputs, by contrast, represent the full data set. For the regression outputs we coded in references to the results, so the reported values update with each sample while the interpretations stay static. The patterns and relationships observed remain constant across samples because of the strength and significance of our variables and the large sample size.
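If the charts should stay identical between renders, one option is to fix the random seed before drawing the sample. The sketch below is only an illustration under our own assumptions: `full_dat` is a small simulated stand-in for the full data set, `samp_demo` is a hypothetical name, and the seed value is arbitrary.

```r
# Stand-in for the full data set; the real one has roughly 11 million rows.
set.seed(42)  # arbitrary seed, fixed so the sample (and the charts) stop changing
full_dat <- data.frame(
  ITIN_YIELD = runif(100000, min = 0.05, max = 2),
  PASSENGERS = sample(1:10, 100000, replace = TRUE)
)

# Draw 20,000 rows without replacement, as described above.
samp_idx  <- sample(nrow(full_dat), size = 20000)
samp_demo <- full_dat[samp_idx, ]

nrow(samp_demo)
```

Re-running the chunk with the same seed returns the same rows, so any chart built from `samp_demo` would be reproducible.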
| Variable Name | Description |
|---|---|
| QUARTER | Quarter (1-4) |
| ROUNDTRIP | Round Trip Indicator (1=Yes) |
| ITIN_YIELD | Itinerary Fare per Mile Flown, in Dollars (ITIN_FARE / MILES_FLOWN) |
| PASSENGERS | Number of Passengers |
| ITIN_FARE | Itinerary Fare Per Person |
| DISTANCE_GROUP | Distance Group, in 500 Mile Intervals |
| MILES_FLOWN | Itinerary Miles Flown (Track Miles) |
| ITIN_GEO_TYPE | Itinerary Geography Type, 0 = Contiguous Domestic (Lower 48 U.S. States Only), 1 = Non-contiguous Domestic (Includes Hawaii, Alaska and Territories) |
tabl3 <-"
| Transformed Variable Name | Original Variable Name | Description |
|--------------------|:---------------:|:--------------------------:|
| lPASSENGERS | PASSENGERS | Log(PASSENGERS) |
| SQRT_1over_DG_x_MF | DISTANCE_GROUP & MILES_FLOWN | $\\sqrt{\\frac{1}{\\text{DISTANCE_GROUP} * \\text{MILES_FLOWN}}}$ |
"
tabl3 %>% pander()
| Transformed Variable Name | Original Variable Name | Description |
|---|---|---|
| lPASSENGERS | PASSENGERS | Log(PASSENGERS) |
| SQRT_1over_DG_x_MF | DISTANCE_GROUP & MILES_FLOWN | \(\sqrt{\frac{1}{\text{DISTANCE_GROUP} * \text{MILES_FLOWN}}}\) |
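The two transformed variables in the table above can be built directly from the raw columns. This is a minimal sketch on made-up values, not the code used for the report; the data frame `itin` is hypothetical, and we assume the log is the natural log (the default of R's `log()`).

```r
# Toy itineraries; column names follow the variable table, values are made up.
itin <- data.frame(
  PASSENGERS     = c(1, 2, 10),
  DISTANCE_GROUP = c(1, 4, 10),
  MILES_FLOWN    = c(400, 1600, 4900)
)

# Natural log of passengers (assumption: the report used R's default log()).
itin$lPASSENGERS <- log(itin$PASSENGERS)

# Square root of the reciprocal of DISTANCE_GROUP * MILES_FLOWN.
itin$SQRT_1over_DG_x_MF <- sqrt(1 / (itin$DISTANCE_GROUP * itin$MILES_FLOWN))

itin
```

Note how quickly the transformed distance variable shrinks: the first toy itinerary gives 0.05 and the second 0.0125, so a "one unit" change in it is far larger than anything seen in the data.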
ggplot(samp %>% drop_na()) +
geom_smooth(aes(x= lPASSENGERS, y=ITIN_YIELD, col=ROUNDTRIP)) +
#geom_jitter(aes(x= lPASSENGERS, y=ITIN_YIELD, col=ROUNDTRIP), alpha= 0.05) +
facet_grid(rows= ~ITIN_GEO_TYPE)+
theme_bw()+
labs(col = "Flight Type", title= "Yield by log(Passengers)")+
xlab("Log(Passengers)") + ylab("Fare per mile per passenger (Dollars)")
ggplot(samp %>% drop_na()) +
#geom_jitter(aes(x= DISTANCE_GROUP, y=ITIN_YIELD, col=ROUNDTRIP), alpha= 0.0075) +
geom_smooth(aes(x= DISTANCE_GROUP, y=ITIN_YIELD, col=ROUNDTRIP)) +
facet_grid(rows= ~ITIN_GEO_TYPE)+
theme_bw()+
theme(
panel.spacing = unit(0.5, "lines")
)+
labs(col = "Flight Type", title= "Yield by Distance")+
xlab("Distance in intervals of 500") + ylab("Fare per mile per passenger (Dollars)")
ggplot(data=samp) +
geom_histogram(aes(x=ITIN_YIELD, fill= ROUNDTRIP)) +
theme_bw() +
gghighlight(use_direct_label = FALSE) +
facet_wrap(~ITIN_GEO_TYPE) +
theme(
panel.spacing = unit(0.5, "lines"),
axis.ticks.x=element_blank()
)+
labs(fill = "Flight Type", title= "Distribution of Yields by Flight Types")+
xlab("Fare per mile per passenger (Dollars)") + ylab("Count")
pander(favstats(ITIN_YIELD ~ ROUNDTRIP + ITIN_GEO_TYPE, data=FullDat_Filt)[c("ROUNDTRIP.ITIN_GEO_TYPE", "Q1","median", "mean","Q3", "sd","n")], caption= "Summary table of Yields by Flight Type and Geography Type")
| ROUNDTRIP.ITIN_GEO_TYPE | Q1 | median | mean | Q3 | sd | n |
|---|---|---|---|---|---|---|
| One-Way.Continguous Domestic | 0.1047 | 0.1738 | 0.2354 | 0.2908 | 0.2091 | 4025439 |
| RoundTrip.Continguous Domestic | 0.1015 | 0.1599 | 0.2063 | 0.2571 | 0.1657 | 5803943 |
| One-Way.Non-Continguous Domestic | 0.0709 | 0.1014 | 0.1423 | 0.1586 | 0.148 | 338790 |
| RoundTrip.Non-Continguous Domestic | 0.0681 | 0.0942 | 0.1246 | 0.1337 | 0.1322 | 438735 |
pander(favstats(ITIN_YIELD ~ DISTANCE_GROUP, data=FullDat_Filt)[c(1:5, 12:16, 23:25),c("DISTANCE_GROUP", "Q1","median", "mean","Q3", "sd","n")], caption= "Summary table of Yields by Distance Group")
| DISTANCE_GROUP | Q1 | median | mean | Q3 | sd | n |
|---|---|---|---|---|---|---|
| 1 | 0.3347 | 0.5259 | 0.6113 | 0.8075 | 0.3752 | 420249 |
| 2 | 0.1809 | 0.2847 | 0.3348 | 0.4358 | 0.2166 | 1675504 |
| 3 | 0.1301 | 0.2045 | 0.2357 | 0.3043 | 0.1472 | 1941759 |
| 4 | 0.1071 | 0.1659 | 0.1874 | 0.2405 | 0.1128 | 1759498 |
| 5 | 0.0901 | 0.136 | 0.1561 | 0.1984 | 0.09458 | 1631197 |
| 12 | 0.0601 | 0.0841 | 0.09356 | 0.1158 | 0.048 | 74074 |
| 13 | 0.0595 | 0.0819 | 0.08936 | 0.1104 | 0.04411 | 30455 |
| 14 | 0.0613 | 0.0829 | 0.09038 | 0.1134 | 0.04188 | 26418 |
| 15 | 0.0572 | 0.079 | 0.08533 | 0.1071 | 0.0393 | 14473 |
| 16 | 0.0611 | 0.0798 | 0.08534 | 0.1036 | 0.03466 | 22320 |
| 23 | 0.0552 | 0.06725 | 0.07131 | 0.0857 | 0.02555 | 258 |
| 24 | 0.0568 | 0.0681 | 0.07164 | 0.08505 | 0.02991 | 131 |
| 25 | 0.0631 | 0.0869 | 0.07687 | 0.1041 | 0.03513 | 377 |
Two regressions were created during our attempts to better understand the data and the relationships between our variables. The first uses at most simple transformations, such as logs, to help reduce heteroskedasticity, while the second employs a more involved transformation to linearize a variable that did not initially hold a simple linear pattern with our endogenous variable.
Variables
panel.cor <- function(x, y, digits=2, prefix="", cex.cor)
{
usr <- par("usr"); on.exit(par(usr))
par(usr = c(0, 1, 0, 1))
r <- abs(cor(x, y))
txt <- format(c(r, 0.123456789), digits=digits)[1]
txt <- paste(prefix, txt, sep="")
# fall back to the supplied cex.cor so cex is always defined
cex <- if(missing(cex.cor)) 0.8/strwidth(txt) else cex.cor
test <- cor.test(x,y)
# borrowed from printCoefmat
Signif <- symnum(test$p.value, corr = FALSE, na = FALSE,
cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1),
symbols = c("***", "**", "*", ".", " "))
text(0.5, 0.5, txt, cex = 1.5 )
text(.7, .8, Signif, cex=cex, col=2)
}
pairs(samp, lower.panel=panel.smooth, upper.panel=panel.cor)
\[ \underbrace{Y_i}_\text{Itinerary Yield} \underbrace{=}_{\sim} \overbrace{\beta_0}^{\stackrel{\text{y-int}}{\text{Base Yield}}} + \overbrace{\beta_1}^{\stackrel{\text{slope along}}{\text{lPassenger}}} \underbrace{X_{1i}}_\text{lPassenger} + \overbrace{\beta_2}^{\stackrel{\text{change in}}{\text{y-int}}} \underbrace{X_{2i}}_\text{Distance Group} + \overbrace{\beta_3}^{\stackrel{\text{change in}}{\text{y-int}}} \underbrace{X_{3i}}_\text{Roundtrip} + \overbrace{\beta_4}^{\stackrel{\text{change in}}{\text{y-int}}} \underbrace{X_{4i}}_\text{Non-Continguous} +\overbrace{\beta_5}^{\stackrel{\text{change in}}{\text{slope}}} \underbrace{X_{1i}X_{2i}}_\text{lPassenger:Distance Group} + \epsilon_i \]
lm1 <- lm(ITIN_YIELD ~ lPASSENGERS + DISTANCE_GROUP + ROUNDTRIP + ITIN_GEO_TYPE + lPASSENGERS:DISTANCE_GROUP , data= samp)
summary(lm1) %>% pander
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 0.3559 | 0.002609 | 136.4 | 0 |
| lPASSENGERS | -0.03338 | 0.003671 | -9.092 | 1.063e-19 |
| DISTANCE_GROUP | -0.03484 | 0.0005114 | -68.12 | 0 |
| ROUNDTRIPRoundTrip | 0.05373 | 0.002601 | 20.66 | 8.305e-94 |
| ITIN_GEO_TYPENon-Continguous Domestic | 0.08365 | 0.005044 | 16.58 | 2.355e-61 |
| lPASSENGERS:DISTANCE_GROUP | -0.002986 | 0.0008062 | -3.704 | 0.0002126 |
| Observations | Residual Std. Error | \(R^2\) | Adjusted \(R^2\) |
|---|---|---|---|
| 20000 | 0.1625 | 0.2265 | 0.2263 |
lm1_r2 <- round(summary(lm1)$adj.r.squared, 2)
lm1_RSE <- round(sigma(lm1)*100, 1)
matrix_coef <- summary(lm1)$coefficients
my_estimates <- matrix_coef[ , 1]
b0 <- round(my_estimates[1]*100, 2)
b1 <- round(my_estimates[2]*100, 2)
b2 <- round(my_estimates[3]*100, 2)
b3 <- round(my_estimates[4]*100, 2)
b4 <- round(my_estimates[5]*100, 2)
b5 <- round(my_estimates[6], 2)
matrix_coef %>% pander(caption= "Results")
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 0.3559 | 0.002609 | 136.4 | 0 |
| lPASSENGERS | -0.03338 | 0.003671 | -9.092 | 1.063e-19 |
| DISTANCE_GROUP | -0.03484 | 0.0005114 | -68.12 | 0 |
| ROUNDTRIPRoundTrip | 0.05373 | 0.002601 | 20.66 | 8.305e-94 |
| ITIN_GEO_TYPENon-Continguous Domestic | 0.08365 | 0.005044 | 16.58 | 2.355e-61 |
| lPASSENGERS:DISTANCE_GROUP | -0.002986 | 0.0008062 | -3.704 | 0.0002126 |
Our initial regression model, estimated by ordinary least squares, has an \(R^2\) of 0.23, which in the scope of our data is fairly substantial. Airline pricing is incredibly varied and involves hundreds of possible factors; we have access to only a few of them and so can account for only a limited share of the total variation. Our residual standard error, however, is less than ideal in context: an error of 16.3 cents is a large share of our total yield range ($0.05-$2), about 8% of that range to be specific.
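Skipping the y-intercept, whose interpretation would make little realistic sense in this case, the specific coefficient interpretations are as follows:
For every one-unit increase in log(passengers) we see a decline in yield of 3.34 cents before the distance interaction is applied; taking the log as natural, a 1% increase in itinerary passengers therefore corresponds to a decline of roughly 0.03 cents.
For every 500 additional miles on an itinerary (one distance group) we see a 3.48 cent decline in yield for a single-passenger itinerary, where log(passengers) is zero.
Roundtrip flights on average provide an additional 5.37 cent yield.
Domestic (non-contiguous) flights on average yield 8.36 cents more per mile.
The interaction term shows that each additional distance group makes the log(passengers) slope more negative by about 0.30 cents.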
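Because the model contains an lPASSENGERS:DISTANCE_GROUP interaction, the slope along lPASSENGERS is not a single number but depends on the distance group. The sketch below evaluates that slope using coefficients copied from the lm1 results table; the helper `slope_lpass` is a hypothetical name introduced only for illustration.

```r
# Coefficients copied from the lm1 results table above (dollars per mile).
b_lpass <- -0.03338     # lPASSENGERS
b_inter <- -0.002986    # lPASSENGERS:DISTANCE_GROUP

# Change in yield for a one-unit increase in lPASSENGERS at a given distance group.
slope_lpass <- function(distance_group) {
  b_lpass + b_inter * distance_group
}

# Slopes at distance groups 1, 5 and 10, converted to cents.
round(slope_lpass(c(1, 5, 10)) * 100, 2)
```

With these coefficients the slope is about -3.64 cents at distance group 1, -4.83 cents at group 5 and -6.32 cents at group 10, so the passenger discount deepens on longer itineraries.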
As the data is not a time series, we limited our testing to heteroskedasticity and multi-collinearity.
Below are the results from a Breusch-Pagan test:
bptest(lm1)
##
## studentized Breusch-Pagan test
##
## data: lm1
## BP = 647.06, df = 5, p-value < 2.2e-16
Despite the log transformation made on passengers, significant heteroskedasticity is still present. This is likely due to variability that increases with X, as well as misspecification errors from omitting significant variables.
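For readers unfamiliar with the test, the studentized Breusch-Pagan statistic can be reproduced by hand: regress the squared residuals on the model's regressors and multiply the resulting \(R^2\) by the sample size. The sketch below does this in base R on simulated data (not our airline data); all object names are hypothetical.

```r
set.seed(1)
n <- 500
x <- runif(n, 1, 10)
# Simulated data whose error spread grows with x (heteroskedastic by construction).
y <- 1 + 0.5 * x + rnorm(n, sd = 0.3 * x)

fit <- lm(y ~ x)

# Auxiliary regression: squared residuals on the original regressors.
aux <- lm(resid(fit)^2 ~ x)

# Studentized (Koenker) statistic: n times the R^2 of the auxiliary regression.
bp_stat <- n * summary(aux)$r.squared
bp_df   <- length(coef(aux)) - 1   # regressors, excluding the intercept
bp_pval <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

c(BP = bp_stat, df = bp_df, p.value = bp_pval)
```

A small p-value rejects constant error variance, which is the same reading applied to the `bptest` output above.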
Due to concerns about high correlation between our variables we tested for Multi-collinearity as well:
vif(lm1)
## lPASSENGERS DISTANCE_GROUP
## 3.909970 1.663725
## ROUNDTRIP ITIN_GEO_TYPE
## 1.245640 1.288492
## lPASSENGERS:DISTANCE_GROUP
## 3.848542
As none of our variance inflation factors is greater than 10, we should not be worried about multi-collinearity.
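A variance inflation factor for a numeric predictor is \(1 / (1 - R_j^2)\), where \(R_j^2\) comes from regressing that predictor on all the others. The following base R sketch computes it by hand on simulated predictors; the data and the name `vif_by_hand` are illustrative assumptions and not part of our analysis.

```r
set.seed(2)
n  <- 1000
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)   # deliberately correlated with x1
x3 <- rnorm(n)                        # unrelated to the others
X  <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing x_j on the rest.
vif_by_hand <- sapply(names(X), function(v) {
  aux <- lm(reformulate(setdiff(names(X), v), response = v), data = X)
  1 / (1 - summary(aux)$r.squared)
})

round(vif_by_hand, 2)
```

The correlated pair `x1` and `x2` gets visibly inflated values while `x3` stays close to 1, which is the pattern the rule of thumb of 10 is meant to flag at its extreme.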
Because heteroskedasticity means OLS is no longer BLUE and its conventional standard errors are unreliable, we recalculated the standard errors with a heteroskedasticity-consistent (HC1) covariance estimator. The skeleton of the model remains the same and the coefficients are still the OLS estimates; only the standard errors, and with them the t values and p-values, are recomputed.
\[ \underbrace{Y_i}_\text{Itinerary Yield} \underbrace{=}_{\sim} \overbrace{\beta_0}^{\stackrel{\text{y-int}}{\text{Base Yield}}} + \overbrace{\beta_1}^{\stackrel{\text{slope along}}{\text{lPassenger}}} \underbrace{X_{1i}}_\text{lPassenger} + \overbrace{\beta_2}^{\stackrel{\text{change in}}{\text{y-int}}} \underbrace{X_{2i}}_\text{Distance Group} + \overbrace{\beta_3}^{\stackrel{\text{change in}}{\text{y-int}}} \underbrace{X_{3i}}_\text{Roundtrip} + \overbrace{\beta_4}^{\stackrel{\text{change in}}{\text{y-int}}} \underbrace{X_{4i}}_\text{Non-Continguous} +\overbrace{\beta_5}^{\stackrel{\text{change in}}{\text{slope}}} \underbrace{X_{1i}X_{2i}}_\text{lPassenger:Distance Group} + \epsilon_i \]
As shown in the output below, the estimated relationships between our exogenous variables and our endogenous variable, yield, are unchanged. What shifts is the precision attached to each estimate: every robust standard error is somewhat larger than its OLS counterpart.
coeftest(lm1, vcov = vcovHC(lm1, type= 'HC1'))
##
## t test of coefficients:
##
## Estimate Std. Error t value
## (Intercept) 0.35592279 0.00361278 98.5177
## lPASSENGERS -0.03337920 0.00463428 -7.2027
## DISTANCE_GROUP -0.03483639 0.00065336 -53.3188
## ROUNDTRIPRoundTrip 0.05373388 0.00279555 19.2212
## ITIN_GEO_TYPENon-Continguous Domestic 0.08364814 0.00531294 15.7442
## lPASSENGERS:DISTANCE_GROUP -0.00298634 0.00093041 -3.2097
## Pr(>|t|)
## (Intercept) < 2.2e-16 ***
## lPASSENGERS 6.114e-13 ***
## DISTANCE_GROUP < 2.2e-16 ***
## ROUNDTRIPRoundTrip < 2.2e-16 ***
## ITIN_GEO_TYPENon-Continguous Domestic < 2.2e-16 ***
## lPASSENGERS:DISTANCE_GROUP 0.001331 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
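The HC1 covariance used in the call above is a "sandwich" of the usual \((X'X)^{-1}\) around the squared residuals, scaled by \(n/(n-k)\). The sketch below builds it by hand in base R on simulated data, as an illustration of the estimator rather than a reproduction of our results; all object names are hypothetical.

```r
set.seed(3)
n <- 400
x <- runif(n, 1, 10)
y <- 2 - 0.3 * x + rnorm(n, sd = 0.2 * x)   # heteroskedastic errors by construction

fit <- lm(y ~ x)
X   <- model.matrix(fit)
e   <- resid(fit)
k   <- ncol(X)

# Sandwich: (X'X)^-1 [X' diag(e^2) X] (X'X)^-1, scaled by n / (n - k) for HC1.
bread    <- solve(crossprod(X))
meat     <- crossprod(X * e)               # equals X' diag(e^2) X
vcov_hc1 <- (n / (n - k)) * bread %*% meat %*% bread

robust_se <- sqrt(diag(vcov_hc1))
ols_se    <- sqrt(diag(vcov(fit)))

# Same estimates, two sets of standard errors.
cbind(estimate = coef(fit), ols_se, robust_se)
```

The point estimates in the first column come straight from `lm()`; only the standard errors differ between the two approaches.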
In addition to the robust standard errors, and given how extreme our Breusch-Pagan results were, we felt it would also be useful to calculate robust 95% confidence intervals for our estimators and be doubly sure that they remained interpretable and useful. As shown, no interval contains zero, so every estimate retains its sign and is safe to include and use in the model.
coefci(lm1, vcov = vcovHC(lm1, type= 'HC1'))
## 2.5 % 97.5 %
## (Intercept) 0.348841448 0.363004136
## lPASSENGERS -0.042462776 -0.024295627
## DISTANCE_GROUP -0.036117028 -0.033555749
## ROUNDTRIPRoundTrip 0.048254375 0.059213388
## ITIN_GEO_TYPENon-Continguous Domestic 0.073234335 0.094061939
## lPASSENGERS:DISTANCE_GROUP -0.004810024 -0.001162649
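Each interval above is the estimate plus or minus a t critical value times the robust standard error. As a worked check for lPASSENGERS, approximating the critical value by 1.96 (our assumption, reasonable with 20,000 observations):

\[ \hat{\beta}_1 \pm t_{0.975} \cdot \widehat{SE}_{HC1}(\hat{\beta}_1) \approx -0.03337920 \pm 1.96 \times 0.00463428 \approx (-0.04246,\ -0.02430) \]

This agrees with the reported bounds for lPASSENGERS to the precision shown.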
In this transformed model, the non-linear relationship between distance group, miles flown and yield is converted into a simple linear one (refer to the variable overview). The implications of this are expanded upon below.
\[ \underbrace{Y_i}_\text{Itinerary Yield} \underbrace{=}_{\sim} \overbrace{\beta_0}^{\stackrel{\text{y-int}}{\text{Base Yield}}} + \overbrace{\beta_1}^{\stackrel{\text{slope along}}{\text{lPassenger}}} \underbrace{X_{1i}}_\text{lPassenger} + \overbrace{\beta_2}^{\stackrel{\text{change in}}{\text{y-int}}} \underbrace{X_{2i}}_\text{SQRT_1over_DG_x_MF} + \overbrace{\beta_3}^{\stackrel{\text{change in}}{\text{y-int}}} \underbrace{X_{3i}}_\text{Roundtrip} + \overbrace{\beta_4}^{\stackrel{\text{change in}}{\text{y-int}}} \underbrace{X_{4i}}_\text{Non-Continguous} +\overbrace{\beta_5}^{\stackrel{\text{change in}}{\text{slope}}} \underbrace{X_{1i}X_{2i}}_\text{lPassenger:SQRT_1over_DG_x_MF} + \epsilon_i \]
To keep the two regressions comparable, all variables were kept the same except that DISTANCE_GROUP was replaced with the new transformed variable.
lm2 <- lm(ITIN_YIELD ~ lPASSENGERS + SQRT_1over_DG_x_MF + ROUNDTRIP + ITIN_GEO_TYPE + lPASSENGERS:SQRT_1over_DG_x_MF, data= samp)
summary(lm2) %>% pander
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | -0.0002572 | 0.00261 | -0.09854 | 0.9215 |
| lPASSENGERS | -0.02186 | 0.002658 | -8.224 | 2.095e-16 |
| SQRT_1over_DG_x_MF | 12.88 | 0.1116 | 115.4 | 0 |
| ROUNDTRIPRoundTrip | 0.068 | 0.002126 | 31.98 | 6.71e-219 |
| ITIN_GEO_TYPENon-Continguous Domestic | 0.006291 | 0.003839 | 1.639 | 0.1013 |
| lPASSENGERS:SQRT_1over_DG_x_MF | -1.704 | 0.12 | -14.2 | 1.475e-45 |
| Observations | Residual Std. Error | \(R^2\) | Adjusted \(R^2\) |
|---|---|---|---|
| 20000 | 0.1375 | 0.4461 | 0.446 |
lm2_r2 <- round(summary(lm2)$adj.r.squared, 2)
lm2_RSE <- round(sigma(lm2)*100, 1)
matrix_coef <- summary(lm2)$coefficients
my_estimates <- matrix_coef[ , 1]
b0 <- round(my_estimates[1]*100, 2)
b1 <- round(my_estimates[2]*100, 2)
b2 <- round(my_estimates[3], 2)
b3 <- round(my_estimates[4]*100, 2)
b4 <- round(my_estimates[5]*100, 2)
b5 <- round(my_estimates[6], 2)
Our transformed regression model, estimated by ordinary least squares, has an \(R^2\) of 0.45, roughly double that of the first model and, in the scope of our data, substantial. Airline pricing is incredibly varied and involves hundreds of possible factors, and we have access to only a few of them. Our residual standard error is still less than ideal in context: an error of about 13.8 cents (0.1375 dollars in the table above) is roughly 7% of our total yield range ($0.05-$2). The primary drawback of this model is that we lose the ability to interpret a change in distance directly, because of the complexity of the transformation.
Skipping the y-intercept, whose interpretation would make little realistic sense in this case, the specific coefficient interpretations are as follows:
For every one-unit increase in log(passengers) we see a decline in yield of 2.19 cents before the distance interaction is applied; taking the log as natural, a 1% increase in itinerary passengers therefore corresponds to a decline of roughly 0.02 cents.
For every 1 unit increase in \(\text{(Miles Flown * Distance group)}^{-\frac{1}{2}}\) on an itinerary we see a 12.88 dollar increase in yield for a single-passenger itinerary, where log(passengers) is zero.
Roundtrip flights on average provide an additional 6.8 cent yield.
Domestic (non-contiguous) flights on average yield 0.63 cents more per mile, but this effect is no longer significant at the 5% level under the OLS standard errors (p = 0.10).
The interaction term shows that each 1 unit increase in \(\text{(Miles Flown * Distance group)}^{-\frac{1}{2}}\) makes the log(passengers) slope more negative by about 1.7 dollars.
As the data is not a time series, we limited our testing to heteroskedasticity and multi-collinearity.
bptest(lm2)
##
## studentized Breusch-Pagan test
##
## data: lm2
## BP = 2858.2, df = 5, p-value < 2.2e-16
Despite the log transformation made on passengers and the attempt to linearize distance, significant heteroskedasticity is still present, in this case even more so than before (the Breusch-Pagan statistic rises from 647.06 to 2858.2). This is likely due to variability that increases with X, as well as misspecification errors from omitting significant variables.
Due to concerns about high correlation between our variables we tested for Multi-collinearity as well:
vif(lm2)
## lPASSENGERS SQRT_1over_DG_x_MF
## 2.863075 1.530031
## ROUNDTRIP ITIN_GEO_TYPE
## 1.162086 1.042313
## lPASSENGERS:SQRT_1over_DG_x_MF
## 3.347589
As none of our variance inflation factors is greater than 10, we should not be worried about multi-collinearity.
Again, because of the issues found in our assumptions, we calculated robust standard errors to use in place of the conventional OLS ones.
As shown in the output below, the coefficient estimates are unchanged and only the standard errors shift. Most of them grow, while the standard error on geography type shrinks slightly, which brings that coefficient just under the 5% significance threshold (p = 0.048).
coeftest(lm2, vcov = vcovHC(lm2, type= 'HC1'))
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.00025718 0.00384553 -0.0669 0.94668
## lPASSENGERS -0.02186187 0.00317125 -6.8938 5.595e-12
## SQRT_1over_DG_x_MF 12.87795718 0.23807526 54.0920 < 2.2e-16
## ROUNDTRIPRoundTrip 0.06799543 0.00231664 29.3509 < 2.2e-16
## ITIN_GEO_TYPENon-Continguous Domestic 0.00629125 0.00318185 1.9772 0.04803
## lPASSENGERS:SQRT_1over_DG_x_MF -1.70420573 0.21591821 -7.8928 3.105e-15
##
## (Intercept)
## lPASSENGERS ***
## SQRT_1over_DG_x_MF ***
## ROUNDTRIPRoundTrip ***
## ITIN_GEO_TYPENon-Continguous Domestic *
## lPASSENGERS:SQRT_1over_DG_x_MF ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Again, in addition to the robust standard errors, and given how extreme our Breusch-Pagan results were, we felt it would also be useful to calculate robust 95% confidence intervals for our estimators and be doubly sure that they remained interpretable and useful. As shown, all estimates retain the same signs across their intervals and are thus safe to include and use in a model, with the exception of our intercept, whose interval contains zero. The interval for geography type excludes zero only narrowly (its lower bound is about 0.00005), so that estimate should be treated with caution.
coefci(lm2, vcov = vcovHC(lm2, type= 'HC1'))
## 2.5 % 97.5 %
## (Intercept) -7.794734e-03 0.007280367
## lPASSENGERS -2.807777e-02 -0.015645967
## SQRT_1over_DG_x_MF 1.241131e+01 13.344604358
## ROUNDTRIPRoundTrip 6.345462e-02 0.072536232
## ITIN_GEO_TYPENon-Continguous Domestic 5.456693e-05 0.012527936
## lPASSENGERS:SQRT_1over_DG_x_MF -2.127423e+00 -1.280988205
#Graph Resolution (more important for more complex shapes)
graph_reso <- 0.025
#Setup Axis
axis_x <- seq(min(samp$DISTANCE_GROUP), max(samp$DISTANCE_GROUP), by = graph_reso)
axis_y <- seq(min(samp$lPASSENGERS), max(samp$lPASSENGERS), by = graph_reso)
axis_col <- as.factor(c("One-Way", "RoundTrip"))
axis_f <- as.factor(c("Continguous Domestic", "Non-Continguous Domestic"))
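#Sample points: predictions from lm1 over the full grid
lmnew <- expand.grid(DISTANCE_GROUP = axis_x, lPASSENGERS = axis_y, ROUNDTRIP = axis_col, ITIN_GEO_TYPE = axis_f , KEEP.OUT.ATTRS=FALSE)
lmnew$Z <- predict(lm1, newdata = lmnew)
#acast needs one value per (lPASSENGERS, DISTANCE_GROUP) cell, otherwise it falls back to counting rows;
#keeping only one-way contiguous itineraries here is an assumption about the intended surface
lmnew <- subset(lmnew, ROUNDTRIP == "One-Way" & ITIN_GEO_TYPE == "Continguous Domestic")
lmnew <- acast(lmnew, lPASSENGERS ~ DISTANCE_GROUP , value.var = "Z") #y ~ x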
samp %>%
filter(ITIN_GEO_TYPE == "Continguous Domestic") %>%
plot_ly(.,
x = ~DISTANCE_GROUP,
y = ~lPASSENGERS,
z = ~ITIN_YIELD,
#text = rownames(samp %>% drop_na()),
type = "scatter3d",
mode ="markers",
color = ~as.factor(ROUNDTRIP),
alpha= 0.7) %>%
layout(title= list(text = "Continguous Domestic Flights (Lower 48)"))
samp %>%
filter(ITIN_GEO_TYPE == "Non-Continguous Domestic") %>%
plot_ly(.,
x = ~DISTANCE_GROUP,
y = ~lPASSENGERS,
z = ~ITIN_YIELD,
#text = rownames(samp %>% drop_na()),
type = "scatter3d",
mode ="markers",
color = ~as.factor(ROUNDTRIP),
alpha= 0.7) %>%
layout(title= list(text = "Non-Continguous Domestic Flights (Outside Lower 48)"))
Rough idea ->
Increasing passengers does lead to decreasing yields, likely through the assumed discounts that come with bulk purchasing.
Increasing distances also lead to lower yields as the flight lasts longer. This is likely related to fixed costs as a percentage of total costs.
Lastly, when comparing one-way and round trips, contiguous flights show a larger one-way premium over round trips than non-contiguous flights do.